 policy distance


JSON-Bag: A generic game trajectory representation

arXiv.org Artificial Intelligence

We introduce the JSON Bag-of-Tokens model (JSON-Bag) as a method to generically represent game trajectories by tokenizing their JSON descriptions, and we apply Jensen-Shannon distance (JSD) as the distance metric between them. Using a prototype-based nearest-neighbor search (P-NNS), we evaluate the validity of JSON-Bag with JSD on six tabletop games (7 Wonders, Dominion, Sea Salt and Paper, Can't Stop, Connect4, and Dots and Boxes), each over three game trajectory classification tasks: classifying the playing agents, the game parameters, or the game seeds that were used to generate the trajectories. Our approach outperforms a baseline using hand-crafted features in the majority of tasks. Evaluation on N-shot classification suggests that using a JSON-Bag prototype to represent a game trajectory class is also sample efficient. Additionally, we demonstrate JSON-Bag's ability for automatic feature extraction by treating tokens as individual features for a Random Forest that solves the tasks above, which significantly improves accuracy on underperforming tasks. Finally, we show that, across all six games, the JSD between JSON-Bag prototypes of agent classes correlates highly with the distances between the agents' policies.

Defining features and representations for games, together with a corresponding distance or similarity metric, is foundational for any task that requires game analysis. Designing agents to play a game in a certain way (to optimize playing strength [1], model human players [2], or optimize playstyle diversity [3]) often requires hand-crafted features based on domain knowledge. Automated game design and content generation require defining game metrics to evaluate generated solutions [4]. In these tasks, optimizing for diversity and novelty in the solution population, rather than only for the targeted fitness functions, can produce better results [3], [5]. Diversity in the population is usually enforced either by defining behavior criteria that partition the search space [6] or by using a distance metric to evaluate the novelty of new solutions [5].
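
The pipeline summarized in the abstract above (tokenize a trajectory's JSON, build a bag of tokens, compare bags with JSD, classify by nearest class prototype) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `path=value` tokenization, the pooled-count prototype, and every function name here are hypothetical choices made for the example.

```python
import json
import math
from collections import Counter


def tokenize(obj, prefix=""):
    """Flatten a JSON value into 'path=value' tokens (assumed scheme)."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from tokenize(value, f"{prefix}{key}.")
    elif isinstance(obj, list):
        # List positions are ignored so that order does not affect the bag.
        for item in obj:
            yield from tokenize(item, prefix)
    else:
        yield f"{prefix.rstrip('.')}={obj}"


def bag_of_tokens(trajectory_json):
    """Count the tokens of one trajectory given as a JSON string."""
    return Counter(tokenize(json.loads(trajectory_json)))


def normalize(counts):
    """Turn token counts into a probability distribution."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tok: c / total for tok, c in counts.items()}


def js_distance(p, q):
    """Jensen-Shannon distance: square root of the base-2 JS divergence."""
    div = 0.0
    for tok in set(p) | set(q):
        pi, qi = p.get(tok, 0.0), q.get(tok, 0.0)
        mi = 0.5 * (pi + qi)
        if pi > 0:
            div += 0.5 * pi * math.log2(pi / mi)
        if qi > 0:
            div += 0.5 * qi * math.log2(qi / mi)
    return math.sqrt(max(div, 0.0))


def prototype(bags):
    """Class prototype: pooled token counts, normalized (an assumption)."""
    pooled = Counter()
    for bag in bags:
        pooled.update(bag)
    return normalize(pooled)


def classify(bag, prototypes):
    """Return the label whose prototype is nearest to the bag under JSD."""
    p = normalize(bag)
    return min(prototypes, key=lambda label: js_distance(p, prototypes[label]))


# Toy usage with made-up trajectories for two hypothetical agent classes.
prototypes = {
    "buyer": prototype([bag_of_tokens('[{"action": "buy"}, {"action": "buy"}]')]),
    "passer": prototype([bag_of_tokens('[{"action": "pass"}]')]),
}
predicted = classify(bag_of_tokens('[{"action": "buy"}]'), prototypes)
```

With base-2 logarithms the distance lies in [0, 1], so identical bags score 0 and bags with no shared tokens score 1.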


Measuring Policy Distance for Multi-Agent Reinforcement Learning

arXiv.org Artificial Intelligence

Diversity plays a crucial role in improving the performance of multi-agent reinforcement learning (MARL). Many diversity-based methods have been developed to overcome the drawbacks of excessive parameter sharing in traditional MARL. However, there remains a lack of a general metric to quantify policy differences among agents. Such a metric would not only facilitate the evaluation of how diversity evolves in multi-agent systems, but also provide guidance for the design of diversity-based MARL algorithms. In this paper, we propose the multi-agent policy distance (MAPD), a general tool for measuring policy differences in MARL. By learning conditional representations of agents' decisions, MAPD can compute the policy distance between any pair of agents. Furthermore, we extend MAPD to a customizable version, which can quantify differences among agent policies on specified aspects. Based on the online deployment of MAPD, we design a multi-agent dynamic parameter sharing (MADPS) algorithm as an example of MAPD's applications. Extensive experiments demonstrate that our method is effective in measuring differences in agent policies and specific behavioral tendencies. Moreover, in comparison to other methods of parameter sharing, MADPS exhibits superior performance.